knitr::opts_chunk$set(echo = FALSE, message = FALSE)
library(Seurat)
library(ggplot2)
library(data.table)
library(MAST)
library(SingleR)
library(dplyr)
library(tidyr)
library(limma)
library(scRNAseq)
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scRNAseq_2.2.0              limma_3.44.3               
##  [3] tidyr_1.1.1                 dplyr_1.0.2                
##  [5] SingleR_1.2.4               MAST_1.14.0                
##  [7] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.2
##  [9] DelayedArray_0.14.1         matrixStats_0.56.0         
## [11] Biobase_2.48.0              GenomicRanges_1.40.0       
## [13] GenomeInfoDb_1.24.2         IRanges_2.22.2             
## [15] S4Vectors_0.26.1            BiocGenerics_0.34.0        
## [17] data.table_1.13.0           ggplot2_3.3.2              
## [19] Seurat_3.2.0               
## 
## loaded via a namespace (and not attached):
##   [1] AnnotationHub_2.20.1          BiocFileCache_1.12.1         
##   [3] plyr_1.8.6                    igraph_1.2.5                 
##   [5] lazyeval_0.2.2                splines_4.0.2                
##   [7] BiocParallel_1.22.0           listenv_0.8.0                
##   [9] digest_0.6.25                 htmltools_0.5.0              
##  [11] magrittr_1.5                  memoise_1.1.0                
##  [13] tensor_1.5                    cluster_2.1.0                
##  [15] ROCR_1.0-11                   globals_0.12.5               
##  [17] colorspace_1.4-1              blob_1.2.1                   
##  [19] rappdirs_0.3.1                ggrepel_0.8.2                
##  [21] xfun_0.16                     crayon_1.3.4                 
##  [23] RCurl_1.98-1.2                jsonlite_1.7.0               
##  [25] spatstat_1.64-1               spatstat.data_1.4-3          
##  [27] survival_3.2-3                zoo_1.8-8                    
##  [29] ape_5.4-1                     glue_1.4.1                   
##  [31] polyclip_1.10-0               gtable_0.3.0                 
##  [33] zlibbioc_1.34.0               XVector_0.28.0               
##  [35] leiden_0.3.3                  BiocSingular_1.4.0           
##  [37] future.apply_1.6.0            abind_1.4-5                  
##  [39] scales_1.1.1                  DBI_1.1.0                    
##  [41] miniUI_0.1.1.1                Rcpp_1.0.5                   
##  [43] viridisLite_0.3.0             xtable_1.8-4                 
##  [45] reticulate_1.16               bit_4.0.4                    
##  [47] rsvd_1.0.3                    htmlwidgets_1.5.1            
##  [49] httr_1.4.2                    RColorBrewer_1.1-2           
##  [51] ellipsis_0.3.1                ica_1.0-2                    
##  [53] pkgconfig_2.0.3               uwot_0.1.8                   
##  [55] dbplyr_1.4.4                  deldir_0.1-28                
##  [57] tidyselect_1.1.0              rlang_0.4.7                  
##  [59] reshape2_1.4.4                later_1.1.0.1                
##  [61] AnnotationDbi_1.50.3          munsell_0.5.0                
##  [63] BiocVersion_3.11.1            tools_4.0.2                  
##  [65] generics_0.0.2                RSQLite_2.2.0                
##  [67] ExperimentHub_1.14.1          ggridges_0.5.2               
##  [69] evaluate_0.14                 stringr_1.4.0                
##  [71] fastmap_1.0.1                 yaml_2.2.1                   
##  [73] goftest_1.2-2                 knitr_1.29                   
##  [75] bit64_4.0.2                   fitdistrplus_1.1-1           
##  [77] purrr_0.3.4                   RANN_2.6.1                   
##  [79] pbapply_1.4-3                 future_1.18.0                
##  [81] nlme_3.1-148                  mime_0.9                     
##  [83] compiler_4.0.2                plotly_4.9.2.1               
##  [85] curl_4.3                      png_0.1-7                    
##  [87] interactiveDisplayBase_1.26.3 spatstat.utils_1.17-0        
##  [89] tibble_3.0.3                  stringi_1.4.6                
##  [91] lattice_0.20-41               Matrix_1.2-18                
##  [93] vctrs_0.3.2                   pillar_1.4.6                 
##  [95] lifecycle_0.2.0               BiocManager_1.30.10          
##  [97] lmtest_0.9-37                 RcppAnnoy_0.0.16             
##  [99] BiocNeighbors_1.6.0           cowplot_1.0.0                
## [101] bitops_1.0-6                  irlba_2.3.3                  
## [103] httpuv_1.5.4                  patchwork_1.0.1              
## [105] R6_2.4.1                      promises_1.1.1               
## [107] KernSmooth_2.23-17            gridExtra_2.3                
## [109] codetools_0.2-16              MASS_7.3-52                  
## [111] assertthat_0.2.1              withr_2.2.0                  
## [113] sctransform_0.2.1             GenomeInfoDbData_1.2.3       
## [115] mgcv_1.8-31                   grid_4.0.2                   
## [117] rpart_4.1-15                  rmarkdown_2.3                
## [119] DelayedMatrixStats_1.10.1     Rtsne_0.15                   
## [121] shiny_1.5.0

Introduction

In v2 of the analysis we decided to include the control mice from the Nbeal experiment with the Migr1 and Mpl mice. The thought is that it may be good to have another control, since the Migr1 control has irradiated and had a bone marrow transplantation. I’m going to split the Rmarkdown files into separate part, to better organize my analysis.

This File

Here I will in include my efforts in to labeling the cell clusters in the integrated dataset. Planned analysis for labeling cell clusters:

  1. SingleR analysis: using both reference datasets; doing centroid analysis, and looking at individual cells
  2. Marker gene expression: looking at cell type specific marker genes
  3. Comparing to labeled human single cell whole bone marrow

SingleR Analysis

Following along with previous analyses I’ve done and also the SingleR documentation.

Cluster Labels

##    Cluster SingleR-comb SingleR-ref1 SingleR-ref2
## 1        0  Neutrophils  Neutrophils Granulocytes
## 2        1 Granulocytes  Neutrophils Granulocytes
## 3        2   Stem cells   Stem Cells Granulocytes
## 4        3      B cells      B cells      B cells
## 5        4  Neutrophils  Neutrophils Granulocytes
## 6        5    Monocytes    Monocytes    Monocytes
## 7        6    Basophils    Basophils Granulocytes
## 8        7   Stem cells   Stem Cells    Monocytes
## 9        8  Macrophages  Macrophages  Macrophages
## 10       9      B cells      B cells      B cells
## 11      10 Erythrocytes      B cells Erythrocytes
## 12      11      T cells      T cells      T cells
## 13      12   Stem cells   Stem Cells Erythrocytes
## 14      13      B cells      B cells      B cells
## 15      14         <NA>      B cells      B cells

The above image shows how best to interpret the cell prediction. Clusters (columns) that have dark red rectangles, or have shaded rectangles featuring similar cell types (ie Neutrophils/Granulocytes) lend confidence to cluster labels. Clusters like 12 show many different labels, with NA cell types, shows a lower confidence in cluster labels.

##    Cluster SingleR-comb SingleR-ref1 SingleR-ref2 SingleR_cell_comb
## 1        0  Neutrophils  Neutrophils Granulocytes        Neutrophil
## 2        1 Granulocytes  Neutrophils Granulocytes      Granulocytes
## 3        2   Stem cells   Stem Cells Granulocytes        Stem Cells
## 4        3      B cells      B cells      B cells            B cell
## 5        4  Neutrophils  Neutrophils Granulocytes        Neutrophil
## 6        5    Monocytes    Monocytes    Monocytes          Monocyte
## 7        6    Basophils    Basophils Granulocytes          Basophil
## 8        7   Stem cells   Stem Cells    Monocytes        Stem cells
## 9        8  Macrophages  Macrophages  Macrophages       Macrophages
## 10       9      B cells      B cells      B cells           B cells
## 11      10 Erythrocytes      B cells Erythrocytes      Erythrocytes
## 12      11      T cells      T cells      T cells           T cells
## 13      12   Stem cells   Stem Cells Erythrocytes        Stem Cells
## 14      13      B cells      B cells      B cells           B cells
## 15      14         <NA>      B cells      B cells                NA

Marker Gene Expression

Cell type specific marker gene expression. Genes were added to the list in two different ways: canonical markers that are well known in the field, and genes that distinguished clusters and were found to play a key role in specific cells.

Literature for Markers: Previously used

Ighd: immunoglobulin heavy constant delta. Seems to clearly be expressed by B-cells, but still working on a good reference.

Gata2: From Krause paper: a transcription factor required for both lineages but bind in different combinations ref

Cd68: a human macrophage marker ref. A more general ref

Vcam1: found papers using Vcam1+ monocytes, but haven’t found a great reference.

Alas2: an erythroid-specfiic 5-aminolevulinate synthase gene ref

Gata3: plays a role in the regulation of T-cells ref

Vwf and Itga2b: I figure the reference would best be left to y’all.

New Markers

Ly6g: from website it plays a role in monocyte, granulocyte, and neutrophil

Ngp: from uniprot “Expressed in myeloid bone marrow cells. Expressed in neutrophilic precursors (at protein level) (PubMed:8749713). Expressed in myeloid bone marrow cells (PubMed:21518852)”

Mmp8: neutrophil/lymphocyte collagenase link

## Warning: Could not find Cd38 in the default search locations, found in RNA assay
## instead
## Warning in FetchData(object = object, vars = features, cells = cells): The
## following requested variables were not found: SCA-1

##    Cluster SingleR-comb SingleR-ref1 SingleR-ref2 SingleR_cell_comb
## 1        0  Neutrophils  Neutrophils Granulocytes        Neutrophil
## 2        1 Granulocytes  Neutrophils Granulocytes      Granulocytes
## 3        2   Stem cells   Stem Cells Granulocytes        Stem Cells
## 4        3      B cells      B cells      B cells            B cell
## 5        4  Neutrophils  Neutrophils Granulocytes        Neutrophil
## 6        5    Monocytes    Monocytes    Monocytes          Monocyte
## 7        6    Basophils    Basophils Granulocytes          Basophil
## 8        7   Stem cells   Stem Cells    Monocytes        Stem cells
## 9        8  Macrophages  Macrophages  Macrophages       Macrophages
## 10       9      B cells      B cells      B cells           B cells
## 11      10 Erythrocytes      B cells Erythrocytes      Erythrocytes
## 12      11      T cells      T cells      T cells           T cells
## 13      12   Stem cells   Stem Cells Erythrocytes        Stem Cells
## 14      13      B cells      B cells      B cells           B cells
## 15      14         <NA>      B cells      B cells                NA
##                    markers
## 1              Granulocyte
## 2                       NA
## 3              Granulocyte
## 4                  B-cells
## 5                       NA
## 6                Monocytes
## 7            Mast Cell/MEP
## 8             Granulocyte?
## 9      Monocyte/Macrophage
## 10                 B-cells
## 11             Erythrocyte
## 12                 T-cells
## 13           Megakaryocyte
## 14 Lymphocyte/Stromal Cell
## 15             Plasma Cell

Comparison to Human Whole Bone Marrow

Need documentation: see slack notes

## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')

Still isn’t completely clear. The thought is too limit the number of genes used to make this correlation. For these figures it was using the 2,000 most highly variable genes in the mouse dataset, and finding human orthologs in the the human bone marrow dataset. Some human genes had multiple mouse orthologs, and we took the first one in the list. This left us with 1,471 genes in the above comparison.

Of the 500 most variable in both mouse and human scRNA datasets (each starting with 1,471 genes), 181 overlapped between the two sets.

## Warning in FindVariableFeatures.Assay(object = assay.data, selection.method =
## selection.method, : selection.method set to 'vst' but count slot is empty; will
## use data slot instead
## Warning in eval(predvars, data, env): NaNs produced
## Warning in hvf.info$variance.expected[not.const] <- 10^fit$fitted: number of
## items to replace is not a multiple of replacement length
##    Mode   FALSE    TRUE 
## logical     273     227

Now just looking at the 500 most variable genes in the mouse data set.

New Cell Type Labels

## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.

Distinguishing Genes

Looking at genes that distinguish certain clusters from others, to help identify the cell type. This was partially done during the marker gene stage, but will be done more thoroughly in this section.

Stem Cell Cluster

##           p_val avg_logFC pct.1 pct.2 p_val_adj
## Fcnb          0 1.7378054 0.953 0.324         0
## Chil3         0 1.6397866 0.995 0.619         0
## Hmgn2         0 1.4644989 0.997 0.796         0
## Hist1h2ap     0 1.2944567 0.874 0.355         0
## Ube2c         0 1.1577756 0.911 0.401         0
## Cebpe         0 1.1227218 0.997 0.498         0
## Mki67         0 1.0921853 0.991 0.394         0
## Top2a         0 1.0540932 0.950 0.350         0
## Birc5         0 0.9629361 0.963 0.375         0
## Tuba4a        0 0.9628674 0.992 0.642         0
##                p_val  avg_logFC pct.1 pct.2    p_val_adj
## Ctsg    1.209808e-15 -0.7837163 0.393 0.322 2.419616e-12
## C1qc    3.637988e-15 -0.6959099 0.157 0.248 7.275977e-12
## C1qa    2.291724e-14 -0.9117860 0.186 0.273 4.583448e-11
## Pf4     4.540254e-14 -1.2180444 0.260 0.346 9.080509e-11
## Stfa2l1 1.651299e-13 -0.8231477 0.283 0.282 3.302597e-10
## Ccl5    9.203515e-12 -0.7810595 0.104 0.172 1.840703e-08
## Akr1c18 2.194479e-10 -0.7730957 0.245 0.240 4.388958e-07
## Jchain  3.281011e-09 -1.4884194 0.070 0.126 6.562022e-06
## Gm5483  1.644833e-08 -1.0157618 0.212 0.244 3.289667e-05
## Igha    1.116925e-05 -1.3644366 0.082 0.123 2.233850e-02

Top 10 * Fcnb: a gene expressed in granulocytes * Chil3: eosinophil chemotactic cytokine * Cebpe: may be essential for terminal differentiation and functional maturation of committed granulocyte progenitor cells * Mki67: marker of proliferation * Birc5: apoptosis inhibitor

Down 10 Many genes associated with lymphocytes (Igha, Jchain)

Conclusion

I think once again these aren’t truly stem cells but a granulocyte progenitor

Progenitor Cluster

##        p_val avg_logFC pct.1 pct.2 p_val_adj
## Elane      0  3.760702 0.980 0.352         0
## Prtn3      0  3.579206 1.000 0.447         0
## Mpo        0  3.448474 0.975 0.272         0
## Ctsg       0  3.076317 0.989 0.302         0
## Ms4a3      0  2.429491 0.980 0.358         0
## Rgcc       0  1.572454 0.961 0.413         0
## Gstm1      0  1.415298 0.975 0.370         0
## Prss57     0  1.300410 0.992 0.269         0
## Dmkn       0  1.234107 0.907 0.163         0
## Med21      0  1.156593 0.994 0.485         0
##                 p_val  avg_logFC pct.1 pct.2    p_val_adj
## Akr1c18  3.394719e-09 -0.7809194 0.130 0.245 6.789439e-06
## Prss34   5.547182e-09 -1.7616139 0.177 0.285 1.109436e-05
## Vpreb1   1.187639e-08 -0.8921897 0.051 0.139 2.375278e-05
## BC100530 1.306323e-08 -0.9247701 0.144 0.265 2.612647e-05
## H2-Ab1   3.483371e-08 -0.9871756 0.254 0.282 6.966742e-05
## Cpa3     1.508041e-07 -0.7044522 0.059 0.157 3.016081e-04
## Jchain   3.591692e-07 -1.4574552 0.042 0.123 7.183384e-04
## Igha     1.711388e-06 -1.3442218 0.045 0.121 3.422775e-03
## Ccl5     5.920506e-06 -0.7380559 0.085 0.167 1.184101e-02
## Ifitm1   1.034447e-05 -1.1345575 0.411 0.462 2.068895e-02

Top 10 Markers * Elane: creates a protein called neutrophil elastase * Prtn3: expressed in neutrophil granulocytes * Mpo: active during myeloid differentiation that consitutes the major component of neutrophil azurophilic granules. * Ctsg: found in azurophil granules of neutrophilic polymorphonuclear leukocytes.

Conclusion

A neutrophil cluster

Megakaryocyte

##                 p_val avg_logFC pct.1 pct.2     p_val_adj
## Gm15915 2.122751e-273 1.3437173 0.955 0.135 4.245502e-270
## Pf4     7.627217e-233 3.2489100 0.961 0.322 1.525443e-229
## Muc13   3.586159e-214 0.8234805 0.910 0.060 7.172318e-211
## Angpt1  5.369361e-200 0.7540279 0.933 0.067 1.073872e-196
## Meis1   8.788350e-196 0.7423735 0.899 0.076 1.757670e-192
## Mrvi1   1.523672e-184 0.8510940 0.927 0.107 3.047344e-181
## Ybx3    3.207742e-179 1.3989186 1.000 0.395 6.415484e-176
## Ncl     8.287229e-179 1.6625501 1.000 0.660 1.657446e-175
## Rab27b  7.655640e-166 0.7732915 0.826 0.042 1.531128e-162
## Srm     2.739026e-164 1.1343476 0.989 0.254 5.478051e-161
##                p_val  avg_logFC pct.1 pct.2    p_val_adj
## Hbb-bt  2.793611e-14 -1.5069564 0.994 0.816 5.587222e-11
## Vpreb3  4.138232e-13 -2.1056835 0.096 0.261 8.276463e-10
## Gm5483  9.865494e-13 -1.2155142 0.416 0.237 1.973099e-09
## Pou2af1 9.986183e-13 -0.7748434 0.067 0.221 1.997237e-09
## Ly6d    3.939269e-09 -1.9298919 0.185 0.262 7.878538e-06
## H2-Aa   1.507854e-08 -0.8936143 0.343 0.213 3.015708e-05
## Iglc2   8.097041e-08 -1.8918088 0.146 0.138 1.619408e-04
## S100a4  2.239373e-07 -0.7541575 0.320 0.221 4.478746e-04
## Hbb-bs  2.460886e-07 -2.6505197 0.994 0.980 4.921772e-04
## Cd74    5.042619e-07 -2.0019411 0.270 0.325 1.008524e-03

Top 10 Markers * Pf4: Mks regulate the quiescence of HSCs through PF4 link * Mrv1: similar to Jaw1, a lymphoid-restricted protein whose expression is downregulated during myeloid differentiation, may be a myeloid leukemia tumor suppressor gene

Conclusion

This top 10 list isn’t very informative

MEP/MAST

##         p_val avg_logFC pct.1 pct.2 p_val_adj
## Mcpt8       0  4.979810 0.990 0.374         0
## Prss34      0  4.306974 0.958 0.231         0
## Ccl4        0  3.566765 0.892 0.271         0
## Ccl3        0  3.296363 0.973 0.284         0
## Ccl9        0  3.149800 0.980 0.289         0
## Ccl6        0  3.129451 0.970 0.654         0
## Cyp11a1     0  2.966787 0.997 0.213         0
## Ifitm1      0  2.915877 0.976 0.422         0
## Akr1c18     0  2.819709 0.809 0.199         0
## Cpa3        0  2.715900 0.770 0.107         0
##                 p_val  avg_logFC pct.1 pct.2    p_val_adj
## Vpreb1   8.391170e-23 -0.9112661 0.024 0.144 1.678234e-19
## Dnajc7   1.476353e-20 -1.2045237 0.508 0.593 2.952707e-17
## Jchain   3.277613e-20 -1.4765529 0.019 0.127 6.555227e-17
## Igha     3.366733e-18 -1.3640588 0.022 0.125 6.733465e-15
## C1qb     2.133267e-17 -0.7737691 0.118 0.268 4.266535e-14
## Pafah1b3 2.070846e-15 -0.8540048 0.312 0.349 4.141693e-12
## Apoe     4.356531e-15 -1.1461806 0.204 0.362 8.713062e-12
## S100a4   6.416181e-15 -0.7703960 0.113 0.231 1.283236e-11
## H2-Ab1   6.769553e-09 -0.9540405 0.220 0.286 1.353911e-05
## Ccl5     2.984116e-07 -0.7035290 0.090 0.169 5.968232e-04

Top 10 Genes * Mcpt8, Prss34, Cpa3: mast cell genes * Ccl: many chemokine ligand genes

###Conclusions

Seems like these are mast cells

MEP/MK/Ery Markers

Looking at the markers I used in “MEP Cluster Introspection” to maybe clarify some clusters.

VWF Markers

MK Markers

MEP Markers in “MEP” Cluster

Looking for MEP markers. Used this paper I picked genes that were expressed in in all there of their MEPs (Kit, Myb, Tgfb1, Cd44). They do a lot looking at MEP subtypes, and once we finalize the MEP cluster/population this would be interesting to look into.

Primative MEP Markers

## Warning: Could not find Cd44 in the default search locations, found in RNA assay
## instead

Ery MEP Markers

## Warning: Could not find Cnrip1 in the default search locations, found in RNA
## assay instead

MK Mep Markers

## Warning: Could not find Cd9 in the default search locations, found in RNA assay
## instead
## Warning: Could not find Lox in the default search locations, found in RNA assay
## instead

## Warning: Could not find Nfib in the default search locations, found in RNA assay
## instead

MEP Markers

## Warning: Could not find Dhrs3 in the default search locations, found in RNA
## assay instead

Ery/MK MEP Markers

More Genes

Looking at the Amit paper from 2015, for transciprtion factors which define one population from another.

TF Erythroid

## Warning: Could not find Mbd2 in the default search locations, found in RNA assay
## instead

## Warning: Could not find Tcf3 in the default search locations, found in RNA assay
## instead

TF MK

## Warning: Could not find Cited2 in the default search locations, found in RNA
## assay instead
## Warning: Could not find Fli1 in the default search locations, found in RNA assay
## instead

TF Basophil

## Warning: Could not find Lmo4 in the default search locations, found in RNA assay
## instead

TF Eosinophil

TF Neutrophil

## Warning: Could not find Gfi1 in the default search locations, found in RNA assay
## instead

TF Monocyte

TF Dendritic Cell